Abstract

Redundancy is a commonly applied technique for improving the reliability and availability of safety-critical systems and of systems with operational impact in the railroad and mass transit industry. In this paper, two basic but different types of parallel redundancy, namely active redundancy and standby redundancy, are introduced and studied according to the redundancy mechanism built into the system structure. The pros and cons of active and standby redundancy are also discussed. The Markov model technique is used to illustrate the Mean Time Between Failure (MTBF) calculation for active and standby redundancy for the purpose of reliability evaluation, and the two types of redundancy are then compared from a reliability point of view.
1. Introduction

In the railroad and mass transit industry, safety-critical functions and systems with operational impact require redundancy to enhance system safety and to strengthen system reliability and availability. Redundancy is defined as the existence of more than one means of accomplishing a given task or function in a system.

It should be noted that redundancy is not without penalties. Although it reduces system mission failures, redundancy increases system logistics failures. It also increases weight, space requirements, complexity, cost, and design time. The increase in complexity results in an increase in unscheduled maintenance. Thus, system safety and mission reliability are gained at the expense of adding one or more items to the unscheduled maintenance chain. The increase in unscheduled maintenance may be counteracted by reliability improvement techniques such as design simplification, derating, and the use of more reliable components.

When incorporating redundancy in a system design, the “checkability” or diagnostic coverage must also be considered. The status of some items may not be checkable before the mission starts; such items are then assumed to be functional at the beginning of the mission. In reality, pre-mission failures of redundant items may go undetected. If it is not known that the redundant elements are operational before the mission starts, the purpose of redundancy can be defeated, because the mission may begin without the designed redundancy (a reliability loss).

The two basic, commonly applied types of redundancy are active redundancy and standby redundancy. Active redundancy does not require external components or devices to perform detection, decision, and switching when an element or path in the redundant structure fails. The redundant elements are always in operation, sharing the system load, and they automatically pick up the load of a failed element. Active redundancy is also called full-on redundancy or load-sharing redundancy in other papers. Fig. 1 shows an active redundant system configuration.




Fig. 1. Active redundancy 

Standby redundancy is redundancy that requires external elements or devices to detect a failure, make a decision, and switch to another element or path as a replacement for the failed element or path. Standby units can be operating (hot standby) or inactive (cold standby). Hot standby and active redundancy can be considered identical if the switching device is perfect. Fig. 2 shows a standby redundant system configuration.



Fig. 2. Standby redundancy 

Military Standard (2005) introduces the concepts of active redundancy and standby redundancy. Mok et al. (2013) present types of redundancy, including active and standby redundancy. Mohammad et al. (2013) present load-sharing systems with a k-out-of-n structure; active redundancy is a 1-out-of-2 load-sharing system.

In reliability engineering practice, once we decide to use redundant design techniques to improve system reliability and availability, we usually confront a fundamental question: which type of redundancy, active or standby, is more appropriate for achieving the required system reliability and availability? In this paper, we perform a reliability analysis that compares active redundancy against standby redundancy using the Markov model technique. The conclusions are summarized at the end of the paper.

2. Markov Model 

The term “Markov model”, named after the Russian mathematician Andrei Markov, originally referred to mathematical models in which the future state of a system depends only on its current state and not on its past history. This memoryless characteristic is the main Markov property. The other characteristic of the Markov models used here is stationarity: a stationary system is one in which the probabilities governing the transitions from state to state remain constant with time (i.e., constant failure rates and repair rates). For any given system, a Markov model consists of a list of the possible states of that system, the possible transition paths between those states, and the rate parameters of those transitions.
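To make the three ingredients listed above concrete (states, transition paths, and transition rates), the following minimal Python sketch assembles them into a transition-rate (generator) matrix. Constant rates reflect the stationarity assumption. The function name `generator_matrix` and the single repairable unit with illustrative rates are hypothetical choices for this example, not taken from the cited references.

```python
from fractions import Fraction


def generator_matrix(states, rates):
    """Build the transition-rate (generator) matrix Q of a Markov model.

    states: list of state labels.
    rates:  dict mapping (from_state, to_state) -> constant transition rate.
    Off-diagonal Q[i][j] is the rate from state i to state j; each diagonal
    entry is minus the total rate out of that state, so every row sums to 0.
    """
    index = {s: k for k, s in enumerate(states)}
    n = len(states)
    q = [[Fraction(0)] * n for _ in range(n)]
    for (src, dst), rate in rates.items():
        i, j = index[src], index[dst]
        q[i][j] += rate
        q[i][i] -= rate
    return q


# Hypothetical example: one repairable unit with failure rate 1/1000 per hour
# and repair rate 1/10 per hour (illustrative values only).
lam, mu = Fraction(1, 1000), Fraction(1, 10)
Q = generator_matrix(["up", "down"], {("up", "down"): lam, ("down", "up"): mu})
for row in Q:
    print([str(x) for x in row])
```

Exact fractions are used so that the zero row sums can be checked without floating-point round-off.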

The Markov model is a useful and powerful reliability analysis tool for evaluating redundant systems with constant failure and repair rates. Klion (1977) introduced Markov approaches for full-on operation and standby operation. Military Standard (2005) introduces Markov theory. Jackson (2013) presents Markov analysis with non-constant hazard rates. Dakic (2015) presents the Markov model as one of the deductive methods among reliability quantification methods and techniques. In this paper, we use the Markov model to compute the reliability parameter Mean Time Between Failure (MTBF) for the active redundant system and for the standby redundant system, and then compare the two systems on the basis of this parameter.

3. Reliability Evaluation for Active Redundancy 

To analyze an active redundant system with the Markov model, we use the state transition diagram illustrated in Fig. 3.




Fig. 3. State transition diagram for active redundancy 

In the above state transition diagram, state one is the initial state, in which unit A and unit B are both operating properly. State two is the state in which one unit has failed and the remaining unit is still working to keep the system operational (success). The system fails to meet its operational requirement only if both unit A and unit B fail; state three is reached when units A and B have both failed. An assumption used in developing the state transition diagram is that unit A and unit B cannot change states simultaneously. In Fig. 3, $\lambda$ is the unit failure rate and $\mu$ is the unit repair rate. Because either unit can fail, state one is left at rate $2\lambda$; from state two, the system returns to state one at the repair rate $\mu$ or moves to state three at rate $\lambda$.
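The transitions described for Fig. 3 can be written directly as rate terms. The sketch below (the function name `active_derivatives` is a hypothetical choice) encodes them; the expressions for state one and state two are the right-hand sides of equations (2) and (7) derived below, and the three derivatives always sum to zero because probability is only moved between states.

```python
def active_derivatives(p1, p2, lam, mu):
    """Rates of change of the state probabilities for active redundancy (Fig. 3).

    State 1 (both units up) is left at rate 2*lam, since either unit can fail.
    State 2 (one unit up) returns to state 1 at the repair rate mu and moves
    to state 3 (both units failed) at rate lam.
    State 3 is absorbing: the system has failed.
    """
    dp1 = -2 * lam * p1 + mu * p2
    dp2 = 2 * lam * p1 - (lam + mu) * p2
    dp3 = lam * p2
    return dp1, dp2, dp3


# Illustrative probabilities and rates (assumed values, per hour).
d = active_derivatives(0.7, 0.2, lam=0.001, mu=0.1)
print(d, sum(d))  # the sum is zero up to floating-point round-off
```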

For state one: 

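The probability of being in state one at time $t+\Delta t$ is equal to the probability of being in state one at time $t$ and not transitioning out during $\Delta t$, plus the probability of being in state two at time $t$ and being repaired (transitioning back to state one) during $\Delta t$. This can be written as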

$$P_1(t+\Delta t) = P_1(t)\,(1 - 2\lambda\,\Delta t) + \mu\,\Delta t\,P_2(t) \tag{1}$$
Moving $P_1(t)$ from the right-hand side to the left-hand side, dividing both sides of equation (1) by $\Delta t$, and letting $\Delta t \to 0$ gives equation (2):


$$\frac{dP_1(t)}{dt} = \lim_{\Delta t \to 0}\frac{P_1(t+\Delta t) - P_1(t)}{\Delta t} = -2\lambda\,P_1(t) + \mu\,P_2(t) \tag{2}$$


Integrating equation (2) from $t = 0$ to $t = \infty$, we obtain

$$\int_0^\infty dP_1(t) = -2\lambda \int_0^\infty P_1(t)\,dt + \mu \int_0^\infty P_2(t)\,dt \tag{3}$$

$$P_1(\infty) - P_1(0) = -2\lambda\,T_1 + \mu\,T_2 \tag{4}$$

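where the boundary conditions are $P_1(\infty) = 0$ and $P_1(0) = 1$.

Note that at $t = 0$ the system is in state one with probability one and in every other state with probability zero; as $t \to \infty$, the probabilities of the success states (states one and two) tend to zero, because the system eventually reaches the failed state three.

$T_1 = \int_0^\infty P_1(t)\,dt$ is defined as the expected time in state one, and $T_2 = \int_0^\infty P_2(t)\,dt$ is defined as the expected time in state two. Substituting the boundary conditions into equation (4) gives

$$1 = 2\lambda\,T_1 - \mu\,T_2 \tag{5}$$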
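As an illustration of $T_i = \int_0^\infty P_i(t)\,dt$, consider the non-repairable case ($\mu = 0$), where equations (2) and (7) have the closed-form solutions $P_1(t) = e^{-2\lambda t}$ and $P_2(t) = 2e^{-\lambda t} - 2e^{-2\lambda t}$. The sketch below, assuming an illustrative $\lambda = 0.5$ per unit time, integrates them numerically with a trapezoidal rule (a helper chosen for this example, not part of the paper's method) and checks equation (5).

```python
import math


def trapezoid(f, t_end, steps):
    """Integrate f over [0, t_end] with the composite trapezoidal rule."""
    h = t_end / steps
    total = 0.5 * (f(0.0) + f(t_end))
    for k in range(1, steps):
        total += f(k * h)
    return total * h


lam = 0.5  # assumed unit failure rate (per unit time); no repair, mu = 0


def p1(t):
    """P1(t) for active redundancy without repair: both units still up."""
    return math.exp(-2 * lam * t)


def p2(t):
    """P2(t) for active redundancy without repair: exactly one unit up."""
    return 2 * math.exp(-lam * t) - 2 * math.exp(-2 * lam * t)


t_end = 80 / lam            # long enough for both probabilities to be negligible
T1 = trapezoid(p1, t_end, 200_000)
T2 = trapezoid(p2, t_end, 200_000)
print(T1, T2)               # approximately 1/(2*lam) = 1.0 and 1/lam = 2.0
print(2 * lam * T1)         # equation (5) with mu = 0: approximately 1.0
```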


For state two: 

The probability of being in state two at time $t+\Delta t$ is equal to the probability of being in state one at time $t$ and transitioning to state two during $\Delta t$, plus the probability of being in state two at time $t$ and not transitioning out during $\Delta t$. This can be written as

$$P_2(t+\Delta t) = P_1(t)\cdot 2\lambda\,\Delta t + P_2(t)\,[1 - (\lambda+\mu)\,\Delta t] \tag{6}$$

Moving $P_2(t)$ from the right-hand side to the left-hand side, dividing both sides of equation (6) by $\Delta t$, and letting $\Delta t \to 0$ gives equation (7):


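$$\frac{dP_2(t)}{dt} = \lim_{\Delta t \to 0}\frac{P_2(t+\Delta t) - P_2(t)}{\Delta t} = 2\lambda\,P_1(t) - (\lambda+\mu)\,P_2(t) \tag{7}$$

Integrating equation (7) from $t = 0$ to $t = \infty$, we obtain

$$\int_0^\infty dP_2(t) = 2\lambda \int_0^\infty P_1(t)\,dt - (\lambda+\mu) \int_0^\infty P_2(t)\,dt \tag{8}$$

$$P_2(\infty) - P_2(0) = 2\lambda\,T_1 - (\lambda+\mu)\,T_2 \tag{9}$$

Since $P_2(0) = 0$ and $P_2(\infty) = 0$, equation (9) gives

$$T_1 = \frac{\lambda+\mu}{2\lambda}\,T_2 \tag{10}$$

Substituting $T_1$ from equation (10) into equation (5), we obtain $1 = \lambda\,T_2$, so that

$$T_2 = \frac{1}{\lambda}, \qquad T_1 = \frac{\lambda+\mu}{2\lambda^2} \tag{11}$$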
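Equations (5) and (10) form a pair of linear equations in $T_1$ and $T_2$. As a check on the substitution step, the sketch below solves them exactly with Python's standard `fractions` module and Cramer's rule; the rates $\lambda = 1/1000$ and $\mu = 1/10$ per hour are assumed for illustration only, and `solve_2x2` is a hypothetical helper.

```python
from fractions import Fraction


def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return x, y


lam, mu = Fraction(1, 1000), Fraction(1, 10)  # assumed rates (per hour)

# Equation (5):                 2*lam*T1 - mu*T2         = 1
# Equation (10), rearranged:    2*lam*T1 - (lam + mu)*T2 = 0
T1, T2 = solve_2x2(2 * lam, -mu, 1, 2 * lam, -(lam + mu), 0)

assert T2 == 1 / lam                       # T2 = 1/lam
assert T1 == (lam + mu) / (2 * lam ** 2)   # T1 = (lam + mu)/(2 lam^2)
print(T1, T2)  # 50500 1000
```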
Here, the success of the system is defined by states one and two, and state three is the failed condition. The MTBF is therefore the sum of the expected times spent in state one and state two. Mathematically, this can be written as

$$\text{MTBF} = T_1 + T_2 = \frac{\lambda+\mu}{2\lambda^2} + \frac{1}{\lambda} = \frac{3\lambda+\mu}{2\lambda^2} \tag{12}$$


If the system is not maintained (non-repairable), setting $\mu = 0$ in equation (12) simplifies it to

$$\text{MTBF} = \frac{3}{2\lambda} \tag{13}$$
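The MTBF in equations (12) and (13) can also be checked without any integration, by iterating the small-step recursions (1) and (6) directly and accumulating $P_1\,\Delta t$ and $P_2\,\Delta t$. A minimal sketch, assuming dimensionless rates $\lambda = 1$ with $\mu = 2$ (repairable) and $\mu = 0$ (non-repairable); the function name, step size, and time horizon are illustrative choices.

```python
def mtbf_by_stepping(lam, mu, dt=1e-3, t_end=60.0):
    """Approximate MTBF = T1 + T2 for active redundancy by iterating the
    small-step recursions (1) and (6) and accumulating P1*dt and P2*dt."""
    p1, p2 = 1.0, 0.0          # initial condition: both units up
    t1 = t2 = 0.0              # running estimates of T1 and T2
    for _ in range(int(t_end / dt)):
        t1 += p1 * dt
        t2 += p2 * dt
        p1, p2 = (p1 * (1 - 2 * lam * dt) + mu * dt * p2,           # eq. (1)
                  p1 * 2 * lam * dt + p2 * (1 - (lam + mu) * dt))    # eq. (6)
    return t1 + t2


for lam, mu in [(1.0, 2.0), (1.0, 0.0)]:          # assumed rates, arbitrary time unit
    closed_form = (3 * lam + mu) / (2 * lam ** 2)  # equation (12); (13) when mu = 0
    print(lam, mu, mtbf_by_stepping(lam, mu), closed_form)
```

The time horizon only needs to be long enough for $P_1$ and $P_2$ to become negligible; with these rates the slowest decay is well below $10^{-10}$ by $t = 60$.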


4. Reliability Evaluation for Standby Redundancy 

Standby redundancy is more complicated than active redundancy because a switching device is needed to detect the failed primary unit and turn on the standby unit, and failures of the switching device have different consequences for system operation. If the switching device operates properly, it detects the failed primary unit and turns on the standby unit, and the system operates until the standby unit fails. If the switching device fails while the primary unit is operating, the system operates until the primary unit fails. If the switching device fails in a way that forces a switch to the standby unit while the primary unit is still capable of operating, the standby unit is turned on and the system operates until the standby unit fails. If the switching device fails while the primary unit is still operating, in such a way that neither the primary unit nor the standby unit can operate, the system fails.

Because of the complexity introduced by the switching device, in this paper we assume that the switching device always operates correctly until the system fails. In other words, failures of the switching device are not taken into account in the reliability analysis below.

To analyze a standby redundant system with the Markov model, we again use a state transition diagram, illustrated in Fig. 4.
Fig. 4. State transition diagram for standby redundancy

In the above state transition diagram, state one is the initial state, in which unit A is operating as the primary unit and unit B is in standby and not operating. State two is the state in which unit A has failed, and the switching device has detected the failure of primary unit A and turned on standby unit B to keep the system operational (success). State three is the state in which primary unit A and standby unit B have both failed. In Fig. 4, $\lambda$ is the unit failure rate and $\mu$ is the unit repair rate. Because only the primary unit operates in state one, state one is left at rate $\lambda$ rather than $2\lambda$.
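The standby chain of Fig. 4 can also be checked by direct simulation. The sketch below is a hypothetical Monte Carlo routine (`simulate_standby_lifetime` is not from the paper) that follows the same assumptions as the text: a perfect switching device and a standby unit that does not fail while idle (implied by the single rate $\lambda$ out of state one). With assumed rates $\lambda = \mu = 1$ in an arbitrary time unit, the average time to reach state three should approach the MTBF of equation (25), $(2\lambda+\mu)/\lambda^2 = 3$.

```python
import random


def simulate_standby_lifetime(lam, mu, rng):
    """Simulate one run of the standby chain in Fig. 4 until state three.

    Assumes a perfect switching device and a standby unit that cannot fail
    while idle. Returns the total time spent in states one and two.
    """
    t, state = 0.0, 1
    while True:
        if state == 1:                      # primary A running, B idle in standby
            t += rng.expovariate(lam)       # A fails; the switch turns on B
            state = 2
        else:                               # exactly one unit running
            t += rng.expovariate(lam + mu)  # time to next event: failure or repair
            if rng.random() < lam / (lam + mu):
                return t                    # running unit fails: state three
            state = 1                       # repair completes: back to state one


rng = random.Random(2024)                   # fixed seed for repeatability
lam, mu = 1.0, 1.0                          # assumed rates, arbitrary time unit
runs = [simulate_standby_lifetime(lam, mu, rng) for _ in range(100_000)]
print(sum(runs) / len(runs))                # compare with (2*lam + mu)/lam**2 = 3.0
```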

For state one: 

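The probability of being in state one at time $t+\Delta t$ is equal to the probability of being in state one at time $t$ and not transitioning out during $\Delta t$, plus the probability of being in state two at time $t$ and being repaired (transitioning back to state one) during $\Delta t$. This can be written as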

$$P_1(t+\Delta t) = P_1(t)\,(1 - \lambda\,\Delta t) + \mu\,\Delta t\,P_2(t) \tag{14}$$


Moving $P_1(t)$ from the right-hand side to the left-hand side, dividing both sides of equation (14) by $\Delta t$, and letting $\Delta t \to 0$ gives


$$\frac{dP_1(t)}{dt} = \lim_{\Delta t \to 0}\frac{P_1(t+\Delta t) - P_1(t)}{\Delta t} = -\lambda\,P_1(t) + \mu\,P_2(t) \tag{15}$$


Integrating equation (15) from $t = 0$ to $t = \infty$, we obtain

$$\int_0^\infty dP_1(t) = -\lambda \int_0^\infty P_1(t)\,dt + \mu \int_0^\infty P_2(t)\,dt \tag{16}$$

$$P_1(\infty) - P_1(0) = -\lambda\,T_1 + \mu\,T_2 \tag{17}$$

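where the boundary conditions are $P_1(\infty) = 0$ and $P_1(0) = 1$.

As in the active case, at $t = 0$ the system is in state one with probability one and in every other state with probability zero; as $t \to \infty$, the probabilities of the success states (states one and two) tend to zero.

$T_1 = \int_0^\infty P_1(t)\,dt$ is defined as the expected time in state one, and $T_2 = \int_0^\infty P_2(t)\,dt$ is defined as the expected time in state two. Substituting the boundary conditions into equation (17) gives

$$1 = \lambda\,T_1 - \mu\,T_2 \tag{18}$$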


For state two: 

The probability of being in state two at time $t+\Delta t$ is equal to the probability of being in state one at time $t$ and transitioning to state two during $\Delta t$, plus the probability of being in state two at time $t$ and not transitioning out during $\Delta t$. This can be written as

$$P_2(t+\Delta t) = P_1(t)\,\lambda\,\Delta t + P_2(t)\,[1 - (\lambda+\mu)\,\Delta t] \tag{19}$$

Moving $P_2(t)$ from the right-hand side to the left-hand side, dividing both sides of equation (19) by $\Delta t$, and letting $\Delta t \to 0$ gives equation (20):


$$\frac{dP_2(t)}{dt} = \lim_{\Delta t \to 0}\frac{P_2(t+\Delta t) - P_2(t)}{\Delta t} = \lambda\,P_1(t) - (\lambda+\mu)\,P_2(t) \tag{20}$$


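Integrating equation (20) from $t = 0$ to $t = \infty$, we obtain

$$\int_0^\infty dP_2(t) = \lambda \int_0^\infty P_1(t)\,dt - (\lambda+\mu) \int_0^\infty P_2(t)\,dt \tag{21}$$

$$P_2(\infty) - P_2(0) = \lambda\,T_1 - (\lambda+\mu)\,T_2 \tag{22}$$

Since $P_2(0) = 0$ and $P_2(\infty) = 0$, equation (22) gives

$$T_1 = \frac{\lambda+\mu}{\lambda}\,T_2 \tag{23}$$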


Substituting $T_1$ from equation (23) into equation (18), we obtain $1 = \lambda\,T_2$, so that


$$T_2 = \frac{1}{\lambda}, \qquad T_1 = \frac{\lambda+\mu}{\lambda^2} \tag{24}$$
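Instead of substituting one equation into another, the expected times can also be obtained in one step by solving the linear system formed by equation (18) and equation (22) with its boundary values. The sketch below swaps in a generic exact Gauss-Jordan solve (the helper `expected_state_times` is hypothetical, not the paper's procedure) applied to the standby rate terms of equations (15) and (20), again with illustrative rates $\lambda = 1/1000$ and $\mu = 1/10$ per hour; it reproduces $T_1$, $T_2$, and their sum.

```python
from fractions import Fraction


def expected_state_times(m, p0):
    """Solve (-m) T = p0 exactly by Gauss-Jordan elimination.

    m:  rate matrix of the success states; m[i][j] is the coefficient of P_j
        in dP_i/dt.
    p0: initial probabilities of the success states.
    Returns T, the expected time spent in each success state.
    """
    n = len(m)
    a = [[-m[i][j] for j in range(n)] + [p0[i]] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


lam, mu = Fraction(1, 1000), Fraction(1, 10)   # assumed rates (per hour)
# Standby redundancy, equations (15) and (20):
#   dP1/dt = -lam*P1 + mu*P2,   dP2/dt = lam*P1 - (lam + mu)*P2
m = [[-lam, mu],
     [lam, -(lam + mu)]]
T1, T2 = expected_state_times(m, [Fraction(1), Fraction(0)])
print(T1, T2, T1 + T2)   # 101000 1000 102000
```

Because the helper only needs the rate matrix and the initial probabilities, the same call with the active-redundancy rates of equations (2) and (7) would return the values in equation (11).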


Here, the success of the system is defined by states one and two, and state three is the failed condition. The MTBF is therefore the sum of the expected times spent in state one and state two. Mathematically, this can be written as


$$\text{MTBF} = T_1 + T_2 = \frac{\lambda+\mu}{\lambda^2} + \frac{1}{\lambda} = \frac{2\lambda+\mu}{\lambda^2} \tag{25}$$


If the system is not maintained (non-repairable), setting $\mu = 0$ in equation (25) simplifies it to

$$\text{MTBF} = \frac{2}{\lambda} \tag{26}$$
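For the non-repairable case, equations (13) and (26) have a simple probabilistic reading: with active redundancy the system lives until the later of the two unit failures, while with a cold standby and a perfect switch the two unit lifetimes add. The seeded Monte Carlo sketch below, assuming an illustrative $\lambda = 1$ in an arbitrary time unit and exponentially distributed unit lifetimes (the constant-failure-rate assumption), compares the sample means with both equations.

```python
import random

rng = random.Random(7)       # fixed seed for repeatability
lam = 1.0                    # assumed unit failure rate, arbitrary time unit
n = 200_000

active = standby = 0.0
for _ in range(n):
    a, b = rng.expovariate(lam), rng.expovariate(lam)
    active += max(a, b)      # active: system fails when the last unit fails
    standby += a + b         # cold standby, perfect switch: lifetimes add
print(active / n, 3 / (2 * lam))   # equation (13)
print(standby / n, 2 / lam)        # equation (26)
```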


5. Comparison and Conclusion 
Based on the reliability assessment and analysis performed above, both active redundancy and standby redundancy improve system mission reliability and availability and prolong system operating time. For repairable (maintained) systems, the reliability improvement is significantly greater than for non-repairable (non-maintained) systems. Mathematically, the calculated MTBF of standby redundancy is higher than that of active redundancy: without repair, equations (13) and (26) give $3/(2\lambda)$ and $2/\lambda$, so standby redundancy yields an MTBF one third higher; with repair, the ratio of equation (25) to equation (12) is $(4\lambda+2\mu)/(3\lambda+\mu)$, which increases from $4/3$ toward $2$ as $\mu/\lambda$ grows. This advantage, however, rests on the assumption of a perfect switching device. The switching device increases the complexity of the standby redundant system, and its failure degrades the mission reliability and availability of that system. Therefore, system engineers should consider many factors, including cost, complexity, maintainability, space, checkability, the failure rates of the units and of the switching device, failure consequences, and safety impact, and decide which redundancy technique is more appropriate to achieve the intended system mission reliability requirement based on an analysis of the trade-offs involved.
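To quantify the comparison, the sketch below tabulates the ratio of the standby MTBF in equation (25) to the active MTBF in equation (12) for several assumed values of $\mu/\lambda$. The failure rate $\lambda = 10^{-4}$ per hour is illustrative only; since the ratio depends only on $\mu/\lambda$, the choice of $\lambda$ does not affect the table. As in the analysis, a perfect switching device is assumed.

```python
def mtbf_active(lam, mu):
    """Active redundancy MTBF, equation (12)."""
    return (3 * lam + mu) / (2 * lam ** 2)


def mtbf_standby(lam, mu):
    """Standby redundancy MTBF with a perfect switch, equation (25)."""
    return (2 * lam + mu) / lam ** 2


lam = 1e-4  # assumed unit failure rate per hour (illustrative only)
for ratio in [0, 1, 10, 100, 1000]:            # assumed values of mu / lam
    mu = ratio * lam
    print(f"mu/lam = {ratio:>4}: standby/active MTBF = "
          f"{mtbf_standby(lam, mu) / mtbf_active(lam, mu):.3f}")
```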